ABSTRACT

Bipolar disorder and schizophrenia overlap in
symptoms and may share some underlying neural
behavior. Discriminating between the two diseases is
one of the problems facing psychiatric experts. The
present work proposes a solution to this problem
based on artificial intelligence methods. A support
vector machine (SVM) is used for discrimination,
based on recordings of the patient's EEG rhythms.
The large set of signals in the EEG rhythms is
reduced to a smaller set after Fast Fourier Transform
(FFT) segmentation. Four kernels are applied with
the SVM: linear, polynomial, quadratic and radial
basis function. The aim of this work is to apply the
SVM with different kernels to the EEG-based
discrimination of patients suffering from
schizophrenia and bipolar disorder. Analysis of the
results shows that the suggested algorithms can solve
the discrimination problem between the two diseases
using EEG waves and the support vector machine.
The linear and quadratic kernels achieved high
performance rates of 98% and 97.33% respectively,
compared to the other kernels.

Keywords — EEG, schizophrenia, Bipolar Disorder, 
support vector machine (SVM). 

I. Introduction 

Schizophrenia (SZ) and bipolar disorder (BD) are two
common psychiatric illnesses that overlap significantly
in symptoms [1], cognitive features [2], genetic risk [3],
and medication response [4]. The neurobiological
underpinning of these disorders might provide a basis for
understanding their similarities and differences. Evidence
suggests that SZ and psychotic bipolar disorder are
strongly heritable [5]. An effective classification method
is therefore required to distinguish between SZ and BD
patients in order to apply the right treatment to the
patient [6].

Recent studies [7] apply neuroimaging methods to the
diagnosis of schizophrenia and bipolar disorder to reveal
discrete patterns of functional and structural
abnormalities in neural systems that are critical for
emotion regulation. Meanwhile, other research works
employ traditional statistical methods that rely on the
basic assumption of linear combinations, which is not
appropriate for such tasks [8].

Classification is considered as a useful tool for medical 
diagnosis [9]. Fundamentally, classification approach 
could be established by medical experts to enable better 
understanding of diagnosis. Recent research studies 
contributed to the classification of diseases using 
techniques such as expert systems, artificial neural 
networks and SVM [10]. 

In the present work, an automated machine learning
procedure is suggested that can diagnose specific forms of
psychiatric illness using the EEG of non-excited (without
stimulation) patients. The classification of psychiatric
disorders using a support vector machine with different
kernels is explained for three diagnostic classes:
schizophrenia (SZ, labeled SCZ in the tables), BD and
healthy (N).

This paper is organized as follows. Section (II) describes
the subjects and methods. The support vector machine as a
classifier is discussed in Section (III). Section (IV)
provides the results and discussion. Finally, conclusions
are given in Section (V).

II. SUBJECTS AND METHODS 

A. Subjects 

The EEG data were obtained from Abou-Elazayem
Psychiatric Hospital in Egypt [11]. The subjects included
70 healthy persons with no history of neurological or
psychiatric disease, 80 schizophrenic patients and 80
bipolar disorder patients. All patients were hospitalized
and diagnosed with schizophrenia or bipolar disorder by
independent psychiatrists according to the criteria of the
Diagnostic and Statistical Manual of Mental Disorders
[12].

The EEG data were recorded in an acoustically and
electrically shielded room, with the subjects seated
comfortably in a reclining chair. Data were obtained from
16 surface electrodes placed on the scalp according to the
standard international 10/20 system (channels Fp1, Fp2,
F3, F4, F7, F8, C3, C4, P3, P4, T3, T4, T5, T6, O1 and
O2), with reference to linked earlobes. The 16-channel
EEG was digitized at a sampling rate of 50 Hz using a
12-bit A/D converter, and the data were recorded on a hard
disk. For each subject, recordings covered the EEG
activity in a resting condition (without stimulation) for
approximately 3 to 5 minutes.

B. Feature Extraction 

Feature selection is a very important step in pattern
recognition. The idea of feature selection is to choose a
subset of features that improves the performance of the
classifier, especially when dealing with high-dimensional
data. Finding significant features that produce higher
classification accuracy is an important issue.

The feature vector of each subject is extracted in the
following main steps. The EEGLAB toolbox [13] is used to
filter the original spontaneous EEG time series and remove
the artifacts. A clean segment of 2000 time samples,
representing a 20-second interval, is selected from each
subject. The fast Fourier transform (FFT) of the 2000
points of each time series obtained from the 16 channels
is calculated. The 2000 points of each channel are
partitioned into 8 intervals and the mean value of each
interval is calculated, leading to 8 points. So, for each
subject, there are 8 points for each of the 16 channels,
which results in a single 128-element feature vector that
is used as input to the SVM for classification [14].
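
The feature construction described above can be sketched as follows. This is a minimal illustration and not the authors' MATLAB code: it assumes that the FFT magnitude is the quantity being averaged, that the 8 intervals are equal blocks of 250 consecutive FFT points, and that random numbers can stand in for a cleaned recording. The function name extract_features is hypothetical.

```python
import numpy as np

N_CHANNELS = 16    # electrodes, as in the text
N_SAMPLES = 2000   # artifact-free samples per channel, as in the text
N_INTERVALS = 8    # intervals per channel, as in the text


def extract_features(eeg):
    """Map a (16, 2000) EEG segment to a 128-element feature vector.

    Assumptions (not stated in the text): the FFT magnitude is averaged,
    and the intervals are 8 equal blocks of 250 consecutive FFT points.
    """
    eeg = np.asarray(eeg, dtype=float)
    if eeg.shape != (N_CHANNELS, N_SAMPLES):
        raise ValueError("expected an array of shape (16, 2000)")
    # 2000-point FFT of every channel (one row per channel).
    spectrum = np.abs(np.fft.fft(eeg, axis=1))
    # Split each channel's 2000 points into 8 blocks of 250 points.
    blocks = spectrum.reshape(N_CHANNELS, N_INTERVALS, -1)
    # Mean of each block gives 8 values per channel; flatten to 16 * 8 = 128.
    return blocks.mean(axis=2).reshape(-1)


# Synthetic stand-in for one subject's cleaned recording.
rng = np.random.default_rng(0)
features = extract_features(rng.standard_normal((N_CHANNELS, N_SAMPLES)))
print(features.shape)  # (128,)
```

Each subject then contributes one row of 128 values to the training or test matrix.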

III. Support Vector Machine Algorithms 

Support vector machine techniques [15] can be classified 
into three types namely, linearly separable, linearly 
inseparable and non-linearly separable. 

A. Linearly Separable SVM 

Linearly separable classification separates the
high-dimensional data into two groups {+1, -1} without any
overlapping or misclassification (Fig. 1). Many decision
boundaries can separate such data. A perceptron algorithm
finds one of them, whereas the SVM selects the boundary
with the best (widest) margin.



The main objective of SVM is maximizing the margin
width in order to reduce the misclassification error. The
margin width can be calculated by drawing a line (AC)
between H1 and H2 and forming a triangle ABC.

The distance between the two hyperplanes is measured by
calculating the length of AC.

\[ AC = \frac{2}{\|w\|} \]


The optimal hyperplane is given by the equation

\[ w_1 x_1 + w_2 x_2 - b = 0 \]

The hyperplanes H1 and H2 are represented by

\[ w_1 x_1 + w_2 x_2 - b = +1 \]
\[ w_1 x_1 + w_2 x_2 - b = -1 \]

where $w_1$, $w_2$ are the components of the weight
vector, which fixes the orientation of the hyperplanes,
$x_1$, $x_2$ are the coordinates of a data point and $b$
is the bias. The right-hand side takes the value +1, 0 or
-1, which shows how far the hyperplanes H1 and H2 are
from the optimal hyperplane.

Maximizing the margin width $2/\|w\|$ is equivalent to
minimizing $\frac{1}{2}\|w\|^2$ under the constraint

\[ y_i (w_1 x_1 + w_2 x_2 - b) \ge 1 \quad \text{for } i = 1, 2, \ldots, m \tag{5} \]
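
As a small numerical illustration of the margin width and of constraint (5), the sketch below computes 2/||w|| for a two-dimensional weight vector and checks the constraint on a toy data set. The weight vector, bias and points are made-up values chosen for illustration, and the function names are hypothetical.

```python
import math


def margin_width(w):
    """Distance between H1 and H2, i.e. 2 / ||w||."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))


def satisfies_constraint(w, b, points, labels):
    """True if y_i (w . x_i - b) >= 1 holds for every training point."""
    return all(
        y * (sum(wi * xi for wi, xi in zip(w, x)) - b) >= 1
        for x, y in zip(points, labels)
    )


# Made-up separable toy data: class +1 on the right, class -1 on the left.
points = [(3.0, 0.0), (4.0, 1.0), (1.0, 0.0), (0.0, 2.0)]
labels = [+1, +1, -1, -1]
w, b = (1.0, 0.0), 2.0  # hyperplane x_1 = 2

print(margin_width(w))                             # 2.0
print(satisfies_constraint(w, b, points, labels))  # True
```

Scaling w and b by the same factor leaves the decision boundary unchanged but changes the margin width, which is why the constraint fixes the scale at 1.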

Parameters such as the weight vector (w), the bias (b) and
the number of support vectors (m) are essential for
classification. The margin width can be calculated from
the values of w and b using optimization methods, which
is called the primal problem. When the problem is large,
it is difficult to calculate these parameters by the
primal method, and the problem is instead treated as a
dual problem, in which the values are calculated using
Lagrange multipliers as follows. The Lagrangian for the
primal problem is given by


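\[ L(w, b, \alpha) = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{m} \alpha_i \left[ y_i (w \cdot x_i - b) - 1 \right] \tag{6} \]

where $\alpha_i$ is the Lagrange multiplier.

Setting the derivatives of L with respect to w and b to
zero gives

\[ \frac{\partial L}{\partial w} = w - \sum_{i=1}^{m} \alpha_i y_i x_i = 0, \qquad \frac{\partial L}{\partial b} = \sum_{i=1}^{m} \alpha_i y_i = 0 \]

This implies that

\[ w = \sum_{i=1}^{m} \alpha_i y_i x_i \tag{7} \]

Substituting equation (7) into (6) gives

\[ L(w, b, \alpha) = \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j \, x_i \cdot x_j \]

The associated dual form is given by

\[ \max_{\alpha} L(\alpha) = \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j \, x_i \cdot x_j \quad \text{subject to} \quad \sum_{i=1}^{m} \alpha_i y_i = 0, \; \alpha_i \ge 0 \]

The optimum separating hyperplane (OSH) can be calculated
by quadratic programming (QP).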


B. Linearly Inseparable SVM

When the data contain noisy or faulty information, some
error rate is unavoidable and it is impossible to
construct a linear hyperplane that separates binary
classification data without error, as shown in Fig. 2.
Linearly inseparable classification can produce solutions
for high-dimensional data sets with overlapping or
misclassified data. A slack variable $\xi_i$ is used to
represent the error term, with a slight modification of
constraint (5) that allows misclassified points.

\[ w \cdot x_i - b \ge +1 - \xi_i \quad \text{for } y_i = +1, \quad \xi_i \ge 0 \;\; \forall i \tag{8} \]
\[ w \cdot x_i - b \le -1 + \xi_i \quad \text{for } y_i = -1, \quad \xi_i \ge 0 \;\; \forall i \tag{9} \]

Combining (8) and (9) gives

\[ y_i (w \cdot x_i - b) \ge 1 - \xi_i, \quad \xi_i \ge 0 \;\; \forall i \tag{10} \]



Some data points may be incorrectly classified, and the
hyperplanes need to be adjusted for proper classification.
Increasing the margin on either side leads to an increase
in the misclassification error rate. The error is kept
small by minimizing the following function:

\[ \min \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i \]

where C is the constant that sets the compromise between
the size of the margin and the slack $\xi_i$. In Fig. 3,
data points are correctly classified if $\alpha_i$ and
$\xi_i$ are both equal to zero, or if $\alpha_i$ is equal
to zero and $\xi_i$ is less than one. Data points are
misclassified if $\alpha_i$ is equal to C and $\xi_i$ is
greater than one. Data points are considered support
vectors if $\alpha_i$ is greater than 0 and $\xi_i$ is
equal to zero.
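
The trade-off controlled by C can be made concrete with a short sketch. It evaluates the soft-margin objective for a fixed hyperplane, taking each slack variable as the smallest value that satisfies constraint (10), that is max(0, 1 - y_i (w . x_i - b)). The data, the hyperplane and the value of C are made up for illustration, and the function names are hypothetical.

```python
def slack(w, b, x, y):
    """Smallest slack xi_i >= 0 with y (w . x - b) >= 1 - xi_i."""
    score = sum(wi * xi for wi, xi in zip(w, x)) - b
    return max(0.0, 1.0 - y * score)


def soft_margin_objective(w, b, points, labels, C):
    """(1/2) ||w||^2 + C * sum of slacks for a given hyperplane."""
    slacks = [slack(w, b, x, y) for x, y in zip(points, labels)]
    return 0.5 * sum(wi * wi for wi in w) + C * sum(slacks), slacks


# Made-up one-dimensional toy data with one point on the wrong side.
points = [(3.0,), (2.5,), (1.0,), (2.2,)]
labels = [+1, +1, -1, -1]
w, b, C = (1.0,), 2.0, 10.0

objective, slacks = soft_margin_objective(w, b, points, labels, C)
for x, y, s in zip(points, labels, slacks):
    status = "misclassified" if s > 1 else "inside margin" if s > 0 else "ok"
    print(x, y, round(s, 2), status)
print(round(objective, 2))
```

A larger C penalizes the slack terms more heavily, so the optimizer prefers a narrower margin with fewer violations.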
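The Lagrangian formulation for the dual form is

\[ Q(w, b, \xi, \alpha, \beta) = \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i - \sum_{i=1}^{n} \alpha_i \left[ y_i (w \cdot x_i - b) - 1 + \xi_i \right] - \sum_{i=1}^{n} \beta_i \xi_i \tag{11} \]

The negative sign is used because the objective is to
maximize with respect to $\alpha_i$, $\beta_i$ and
minimize with respect to w, b and $\xi_i$.

Differentiating with respect to w, b and $\xi_i$, we get

\[ w = \sum_{i=1}^{n} \alpha_i y_i x_i, \qquad \sum_{i=1}^{n} \alpha_i y_i = 0 \]

and

\[ C - \alpha_i - \beta_i = 0 \]

Substituting in (11),

\[ \max_{\alpha} L(\alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{n} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) \tag{12} \]

with constraints

\[ \sum_{i=1}^{n} \alpha_i y_i = 0, \qquad 0 \le \alpha_i \le C \]

from which we can get the values of w, b and $\xi_i$.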

C. Non Linearly Separable SVM 

Data that are not linearly separable can be mapped into a
higher-dimensional space for classification (Fig. 3). The
nonlinear mapping of the original sample data into the
higher-dimensional space is called feature mapping, and
its mapping function is denoted as $\varphi(x_i)$. In this
case, kernel functions are used in place of evaluating the
mapping function $\varphi(x_i)$ explicitly:

\[ x_i^T x_j \;\rightarrow\; k(x_i, x_j) = \varphi(x_i)^T \varphi(x_j) \]

$k(x_i, x_j)$ is called the kernel function, which is
based on the inner product of the two variables $x_i$,
$x_j$. In the original space the dot product of $x_i$,
$x_j$ is used for calculation. In the higher-dimensional
space the dot product of the mapped points is replaced by
the kernel function [16].
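
The idea that a kernel replaces a dot product in a higher-dimensional space can be checked numerically. The sketch below uses the degree-2 polynomial kernel k(x, z) = (x^T z + 1)^2 in two dimensions, for which one explicit feature map is phi(x) = (1, sqrt(2) x_1, sqrt(2) x_2, x_1^2, sqrt(2) x_1 x_2, x_2^2). The choice of this kernel and the two test points are assumptions made for illustration.

```python
import math


def poly2_kernel(x, z):
    """Degree-2 polynomial kernel (x . z + 1)^2 on the original inputs."""
    return (sum(a * b for a, b in zip(x, z)) + 1.0) ** 2


def phi(x):
    """Explicit 6-dimensional feature map matching poly2_kernel in 2-D."""
    x1, x2 = x
    r2 = math.sqrt(2.0)
    return (1.0, r2 * x1, r2 * x2, x1 * x1, r2 * x1 * x2, x2 * x2)


x, z = (1.0, 2.0), (3.0, -1.0)
in_feature_space = sum(a * b for a, b in zip(phi(x), phi(z)))
print(poly2_kernel(x, z), in_feature_space)  # both are 4 up to rounding
```

The kernel needs only the two-dimensional dot product, while the explicit route first builds six-dimensional vectors. For kernels such as the RBF the explicit map is infinite-dimensional, so only the kernel route is practical.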



Some of the popular kernel functions used in this work
are:

Radial Kernel Function (RBF)

\[ k(x_i, x_j) = \exp\!\left( -\frac{\|x_i - x_j\|^2}{2\sigma^2} \right) \]

Linear Kernel Function

\[ k(x_i, x_j) = x_i^T x_j \]

Polynomial Kernel Function

\[ k(x_i, x_j) = \left[ (x_i^T x_j) + 1 \right]^d \]
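
The three kernels listed above translate directly into code. The sketch below is a plain-Python illustration. The parameter defaults (sigma = 1, d = 3) are placeholders, and treating the quadratic kernel used in the experiments as the polynomial kernel with d = 2 is an assumption, since the text does not define it.

```python
import math


def dot(x, z):
    """Dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(x, z))


def linear_kernel(x, z):
    """k(x, z) = x^T z"""
    return dot(x, z)


def polynomial_kernel(x, z, d=3):
    """k(x, z) = (x^T z + 1)^d; d = 3 is a placeholder default."""
    return (dot(x, z) + 1.0) ** d


def quadratic_kernel(x, z):
    """Assumed here to be the polynomial kernel with d = 2."""
    return polynomial_kernel(x, z, d=2)


def rbf_kernel(x, z, sigma=1.0):
    """k(x, z) = exp(-||x - z||^2 / (2 sigma^2)); sigma = 1 is a placeholder."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def gram_matrix(kernel, samples):
    """Matrix of kernel values for every pair of samples."""
    return [[kernel(a, b) for b in samples] for a in samples]


samples = [(1.0, 1.0), (2.0, 2.0)]
print(gram_matrix(linear_kernel, samples))  # [[2.0, 4.0], [4.0, 8.0]]
print(gram_matrix(rbf_kernel, samples))
```

The Gram matrix is all that the dual problem needs from the data, which is why swapping the kernel changes the classifier without changing the optimization procedure.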

The following example shows how an SVM with a linear
kernel finds the optimal hyperplane for a simple two-point
problem.

Assume two classes of data to be classified using SVM
with linear kernel. Each class consists of only one point,
with labels $y_1 = +1$ and $y_2 = -1$. These points are:

\[ x_1 = A_1 = (1, 1), \quad x_2 = B_1 = (2, 2) \tag{1} \]

From SVM theory, we have the following two equations:

\[ f(w) = \frac{1}{2}\|w\|^2 \tag{2} \]

\[ g_i(w, b) = y_i \left[ (w \cdot x_i) + b \right] - 1 \ge 0 \tag{3} \]

We can expand $g_i(w, b)$ to be:

\[ g_1(w, b) = (w_1 x_{11} + w_2 x_{12} + b) - 1 \ge 0 \tag{4} \]

\[ g_2(w, b) = -(w_1 x_{21} + w_2 x_{22} + b) - 1 \ge 0 \tag{5} \]

Next, we put the equations into the form of a Lagrangian:

\[ L(w, b, \alpha) = f(w) - \alpha_1 g_1(w, b) - \alpha_2 g_2(w, b) \]

\[ L(w, b, \alpha) = \frac{1}{2}\|w\|^2 - \alpha_1 (w_1 x_{11} + w_2 x_{12} + b - 1) - \alpha_2 \left( -(w_1 x_{21} + w_2 x_{22} + b) - 1 \right) \]

\[ L(w, b, \alpha) = \frac{1}{2}\|w\|^2 - \alpha_1 \left( [w_1 x_{11} + w_2 x_{12} + b] - 1 \right) + \alpha_2 (w_1 x_{21} + w_2 x_{22} + b + 1) \tag{6} \]

We solve equations (8) to (12), which are obtained from
the gradient of the Lagrangian:

\[ \nabla L(w, b, \alpha) = \nabla f(w) - \nabla \alpha_1 g_1(w, b) - \nabla \alpha_2 g_2(w, b) = 0 \tag{7} \]
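These equations are:

\[ \frac{\partial}{\partial w_1} L(w, b, \alpha) = w_1 - \alpha_1 x_{11} + \alpha_2 x_{21} = 0 \tag{8} \]

\[ \frac{\partial}{\partial w_2} L(w, b, \alpha) = w_2 - \alpha_1 x_{12} + \alpha_2 x_{22} = 0 \tag{9} \]

\[ \frac{\partial}{\partial b} L(w, b, \alpha) = -\alpha_1 + \alpha_2 = 0 \tag{10} \]

\[ \frac{\partial}{\partial \alpha_1} L(w, b, \alpha) = -\left( [w_1 x_{11} + w_2 x_{12} + b] - 1 \right) = 0 \tag{11} \]

\[ \frac{\partial}{\partial \alpha_2} L(w, b, \alpha) = [w_1 x_{21} + w_2 x_{22} + b] + 1 = 0 \tag{12} \]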


These equations are enough to find the values of w, b and
$\alpha$ analytically. Equating equations (11) and (12),
we get:

\[ [w_1 x_{11} + w_2 x_{12} + b] - 1 = [w_1 x_{21} + w_2 x_{22} + b] + 1 = 0 \]

\[ w_1 x_{11} + w_2 x_{12} - 1 = w_1 x_{21} + w_2 x_{22} + 1 \]

\[ [w_1 x_{11} + w_2 x_{12}] - [w_1 x_{21} + w_2 x_{22}] = 2 \]

\[ [w_1 + w_2] - [2 w_1 + 2 w_2] = 2 \]

\[ -w_1 - w_2 = 2 \]

so that

\[ w_1 = -(w_2 + 2) \tag{13} \]


Substituting the coordinates of the two points into
equations (8) and (9) and combining with equation (10)
gives the following:

\[ w_1 - \alpha_1 + 2 \alpha_2 = 0 \]
\[ w_2 - \alpha_1 + 2 \alpha_2 = 0 \]
\[ \alpha_1 = \alpha_2 \tag{14} \]

which produces:

\[ w_1 + \alpha_1 = 0 \tag{15} \]

\[ w_2 + \alpha_2 = 0 \tag{16} \]

Equating equations (15) and (16), which gives
$w_1 = w_2$ because $\alpha_1 = \alpha_2$, and putting the
result back into equation (13) gives the following:

\[ w_1 = w_2 = -1 \tag{17} \]

Using equation (17) in either of equations (15) or (16)
gives:

\[ \alpha_1 = \alpha_2 = 1 \tag{18} \]

Finally, using equation (17) in equations (11) and (12)
gives:

\[ b = 1 - (w_1 x_{11} + w_2 x_{12}) = 1 - (-1 - 1) = 3 \]
\[ b = -1 - (w_1 x_{21} + w_2 x_{22}) = -1 - (-2 - 2) = 3 \tag{19} \]

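Note that this result also satisfies the conditions
associated with equation (3):

\[ \alpha_i \left( y_i \left[ (w \cdot x_i) + b \right] - 1 \right) = 0 \]

i.e.

\[ \alpha_1 \left( [w_1 x_{11} + w_2 x_{12} + b] - 1 \right) = 0: \quad [-1 - 1 + 3] - 1 = -2 + 3 - 1 = 0 \]
\[ \alpha_2 \left( [w_1 x_{21} + w_2 x_{22} + b] + 1 \right) = 0: \quad [-2 - 2 + 3] + 1 = -4 + 3 + 1 = 0 \]

together with the inequality constraints:

\[ \alpha_1 \ge 0, \quad \alpha_2 \ge 0 \tag{20} \]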
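
The solution of the worked example can be verified numerically. The sketch below plugs w = (-1, -1), b = 3 and alpha_1 = alpha_2 = 1 into the stationarity conditions (8) to (12). The class labels y_1 = +1 and y_2 = -1 are taken from the signs in equations (4) and (5), and the variable names are illustrative.

```python
x = [(1.0, 1.0), (2.0, 2.0)]   # the two training points A and B
y = [+1, -1]                   # labels implied by equations (4) and (5)
w = (-1.0, -1.0)               # equation (17)
alpha = (1.0, 1.0)             # equation (18)
b = 3.0                        # equation (19)


def decision(point):
    """Value of w . x + b for one point."""
    return w[0] * point[0] + w[1] * point[1] + b


# Equations (8) and (9): w - sum_i alpha_i y_i x_i = 0, component by component.
grad_w = [
    w[k] - sum(a * yi * xi[k] for a, yi, xi in zip(alpha, y, x))
    for k in range(2)
]
# Equation (10): -alpha_1 + alpha_2 = 0.
grad_b = -alpha[0] + alpha[1]
# Equations (11) and (12): both points lie exactly on their margin hyperplanes.
margins = [yi * decision(xi) - 1.0 for xi, yi in zip(x, y)]

print(grad_w, grad_b, margins)
```

All printed values are zero, so the solution satisfies every equation of the system. The resulting decision boundary -x_1 - x_2 + 3 = 0 passes through (1.5, 1.5), midway between the two points.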


IV. RESULTS 

The support vector machine with different kernels is
programmed in MATLAB. A MATLAB implementation contained
in a file called "svm train" is used to train the SVM
classifier on the feature vectors. Four different kernels
are used to train the SVM model: linear, polynomial,
quadratic and Gaussian radial basis functions. The number
of samples used is 230 EEG power spectra for all cases.
80 different samples have been used as training data for
the classifier, while 150 different samples (50 from
healthy (Normal) subjects, 50 from schizophrenic patients
and 50 from bipolar disorder patients) have been used for
testing.

The accuracy of the classifier is tested after training the
SVM model. The function used for the classification
process in MATLAB is "SVM-Test". Each of the different
SVM kernel functions (linear, polynomial, quadratic and
Gaussian RBF) is applied to the "SVM-Train" function
independently.

Tables (1) to (4) show the performance of the SVM
classifier when applying the four different kernels.
According to the tables, the linear SVM kernel performed
best, with an average classification accuracy of 98% with
respect to the number of correctly classified examples.
It is followed by the quadratic, polynomial and RBF
kernels, with classification accuracies of 97.33%, 92%
and 72.67% respectively.


Table 1: Performance of Quadratic SVM kernel for Classification of
EEG Power Spectra from the three classes N, SCZ, BD

Class           Tested   Number of   Number of   Percentage of
                data     correct     incorrect   recognition
                         samples     samples
Normal          50       50          0           100.00%
Bipolar         50       48          2           96%
Schizophrenia   50       48          2           96%
Total           150      146         4           97.333%


Table 2: Performance of Linear SVM kernel for Classification of EEG
Power Spectra.

Class           Tested   Number of   Number of   Percentage of
                data     correct     incorrect   recognition
                         samples     samples
Normal          50       50          0           100.00%
Bipolar         50       49          1           98%
Schizophrenia   50       48          2           96%
Total           150      147         3           98%

Table 3: Performance of Polynomial SVM kernel for Classification of
EEG Power Spectra.

Class           Tested   Number of   Number of   Percentage of
                data     correct     incorrect   recognition
                         samples     samples
Normal          50       48          2           96%
Bipolar         50       47          3           94%
Schizophrenia   50       43          7           86%
Total           150      138         12          92%


Table 4: Performance of RBF SVM kernel for Classification of EEG
Power Spectra.

Class           Tested   Number of   Number of   Percentage of
                data     correct     incorrect   recognition
                         samples     samples
Normal          50       45          5           90%
Bipolar         50       37          13          74%
Schizophrenia   50       28          22          56%
Total           150      109         41          72.67%


V. CONCLUSION 

The discrimination of schizophrenia and bipolar disorder
is a significant problem which requires the use of strong
optimizing algorithms. Features extracted from the EEG of
230 subjects (70 normal, 80 schizophrenic patients and 80
bipolar patients) are used. By applying the support vector
machine with four different kernel functions (linear,
polynomial, quadratic, radial basis) to these subjects,
the experimental results have shown that the proposed
algorithms can solve the discrimination problem using
EEG rhythms. The linear and quadratic kernels achieved
high performance rates of 98% and 97.33% respectively,
compared to the two other kernels.